.\"
.\" Copyright (c) 2018-2019, Red Hat, Inc.
.\"
.\" Permission to use, copy, modify, and/or distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND RED HAT, INC. DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL RED HAT, INC. BE LIABLE
.\" FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
.\" OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
.\" CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.\" Author: Jan Friesse <jfriesse@redhat.com>
.\"
.Dd Mar 21, 2018
.Dt SPAUSEDD 8
.Os
.Sh NAME
.Nm spausedd
.Nd Utility to detect and log scheduler pause
.Sh SYNOPSIS
.Nm
.Op Fl dDfhp
.Op Fl m Ar steal_threshold
.Op Fl t Ar timeout
.Sh DESCRIPTION
The
.Nm
utility is used for detecting and logging scheduler pause. This means, when process
should have been scheduled, but it was not. It's also able to use steal
time (time spent in other operating systems when running in a virtualized
environment) so it is (to some extend) able to detect if problem is on the VM
or host side.
.Nm
is able to read information about steal time ether from kernel or (if compiled in)
also use VMGuestLib.
Internally
.Nm
works as following pseudocode:
.Bd -literal -offset indent
repeat:
    store current monotonic time
    store current steal time
    sleep for (timeout / 3)
    set time_diff to (current monotonic time - stored monotonic time)
    if time_diff > timeout:
        display error
        set steal_time_diff to (current steal time - stored steal time)
        if (steal_time_diff / time_diff) * 100 > steal_threshold:
            display steal time error
.Ed
.Pp
.Nm
arguments are as follows:
.Bl -tag -width Ds
.It Fl d
Display debug messages (specify twice to display also trace messages).
.It Fl D
Run on background (daemonize).
.It Fl f
Run on foreground (do not demonize - default).
.It Fl h
Show help.
.It Fl p
Do not set RR scheduler.
.It Fl t Ar steal_threshold
Set steal threshold percent. (default is 10 if kernel information is used and
100 if VMGuestLib is used).
.It Fl t Ar timeout
Set timeout value in milliseconds (default 200).
.El
.Pp
If
.Nm
receives a SIGUSR1 signal, the current statistics are show.
.Sh EXAMPLES
To generate CPU load
.Xr yes 1
together with
.Xr chrt 1
is used in following examples:
.Pp
.Dl chrt -r 99 yes >/dev/null &
.Pp
If chrt fails it may help to use
.Xr cgexec 1
like:
.Pp
.Dl cgexec -g cpu:/ chrt -r 99 yes >/dev/null &
.Pp
First example is physical or virtual machine with 4 CPU threads so
.Xr yes 1
was executed 4 times. In a while
.Nm
should start logging messages similar to:
.Pp
.Dl Mar 20 15:01:54 spausedd: Running main poll loop with maximum timeout 200 and steal threshold 10%
.Dl Mar 20 15:02:15 spausedd: Not scheduled for 0.2089s (threshold is 0.2000s), steal time is 0.0000s (0.00%)
.Dl Mar 20 15:02:16 spausedd: Not scheduled for 0.2258s (threshold is 0.2000s), steal time is 0.0000s (0.00%)
.Dl ...
.Pp
This means that
.Nm
didn't got time to run for longer time than default timeout. It's also visible
that steal time was 0% so
.Nm
is running ether on physical machine or VM where host machine is not overloaded
(VM was scheduled on time).
.Pp
Second example is a host machine with 2 CPU threads running one VM. VM is running
an instance of
.Nm . Two instancies of
.Xr yes 1
was executed on the host machine. After a while
.Nm
should start logging messages similar to:
.Pp
.Dl Mar 20 15:08:20 spausedd: Not scheduled for 0.9598s (threshold is 0.2000s), steal time is 0.7900s (82.31%)
.Dl Mar 20 15:08:20 spausedd: Steal time is > 10.0%, this is usually because of overloaded host machine
.Dl ...
.Pp
This means that
.Nm
didn't got the time to run for almost one second. Also because steal time is
high, it means that
.Nm
was not scheduled because VM wasn't scheduled by host machine.
.Sh DIAGNOSTICS
.Ex -std
.Sh AUTHORS
The
.Nm
utility was written by
.An Jan Friesse Aq jfriesse@redhat.com .
.Sh BUGS
.Bl -dash
.It
OS is not updating steal time as often as monotonic clock. This means that steal
time difference can be (and very often is) bigger than monotonic clock difference,
so steal time percentage can be bigger than 100%. It's happening very often for
highly overloaded host machine when
.Nm
is called with small timeout. This problem is even bigger when VMGuestLib is used.
.It
VMGuestLib seems to randomly block.
.El
